The dataset that this report is based on is called “Teen Relationship Survey Pretest”, obtained from the URL: http://www.pewinternet.org/datasets/sep-25-oct-9-2014-and-feb-10-march-6-2015-teens/
This survey is a KnowledgePanel research study conducted by the Pew Research Center, in which questions were put to both an adult and his/her child aged 13 through 17.
With the teen specifically, the survey touched on a variety of topics related to teen friendships and romantic relationships, particularly the role technology (such as mobile phones, Facebook, etc.) plays in those relationships. On the other hand, with the parent, the survey asked about technology, and the ways the child and the parent use a variety of digital devices and social media platforms.
The dataset has 1642 entries (number of parents/kids who completed the survey) and 317 variables (answers of each question).
The full code for cleaning the original dataset can be found in the R script or Rmd code. The steps taken to eliminate invalid entries are summarised under each question below.
The dataset had a number of questions that were posed to parents, and a number of similar questions posed to their teens.
A number of hypotheses were investigated using this dataset, to see if there were any relationships between parent’s behaviours and the teen’s behaviours with respect to social media and technology usage.
For example, parents were asked if they monitor their child’s location (which could be considered a form of stalking behaviour), and teens were asked whether they had monitored someone they were dating or had dated (a similar stalking behaviour). This was an opportunity to investigate whether teens were more likely to exhibit stalking behaviour if their parent did the same.
There were also questions on, for example, whether the parent blocks their child’s phone usage and whether the child sends sexy/flirty pictures or videos. One of the hypotheses was thus whether the child would be more likely to send sexy/flirty pictures or videos if the parent blocks their child’s phone usage. This could suggest that the parent’s intervention triggers rebellious behaviour (e.g. the stricter the parent, the more the child wants to be liberal).
Methodology
As the dependent variables for most of the investigations were categorical (being responses to survey questions), logistic regression was used for the regression analysis instead of linear regression. Linear regression is poorly suited to a binary outcome, since its fitted values are not constrained to lie between 0 and 1 and its coefficients are hard to interpret as effects on a probability. Dependent variables whose possible responses were “No” or “Yes”, i.e. a binary 0 or 1 response, were chosen for analysis.
For logistic regression, the function glm was used: model <- glm(y ~ x1 + x2 + ... + xk, data = dataframe, family = "binomial").
In most cases, the independent variables were also survey responses, some with more than two options, e.g. 1.Yes, a lot, 2.Yes, a little, 3.No. A review of the literature online suggested two possible approaches: (1) leave such variables as numeric (R would interpret them as continuous variables), or (2) convert them to factors (R would interpret them as categorical variables, and the regression would yield a coefficient for each option other than the reference level by creating dummy variables). Treating these variables as continuous might be contentious, as their values are discrete and one cannot necessarily say that the possible values are equally spaced; it would be difficult to explain why one option had the value 1, and not 2 or 50. Hence, on balance, these variables were converted to factors, even though this increases the number of estimated coefficients, which mechanically raises R2 in a linear regression (and lowers the residual deviance in a logistic regression, while AIC applies a penalty for each added coefficient). All dependent and independent variables were recoded so that a higher value means a stronger response, i.e. instead of 1 for Yes and 2 for No, the coding became 0 for No and 1 for Yes.
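The two coding options can be compared on simulated data. The sketch below is an illustration only: the data frame `toy`, the variable names `pressure` and `posted`, and the random responses are assumptions, not taken from the survey.

```r
# Hypothetical data: a 3-level survey item coded 1 = "Yes, a lot",
# 2 = "Yes, a little", 3 = "No", and a binary outcome coded 1 = Yes, 2 = No.
set.seed(1)
n <- 500
toy <- data.frame(
  pressure = sample(1:3, n, replace = TRUE),
  posted   = sample(1:2, n, replace = TRUE)
)

# Reverse the coding so that a higher value means "more":
# 0 = No, 1 = Yes, a little, 2 = Yes, a lot; outcome 0 = No, 1 = Yes.
toy$pressure_num <- 3 - toy$pressure
toy$posted01     <- 2 - toy$posted

# Option 1: numeric predictor, a single slope that assumes equal spacing.
fit_numeric <- glm(posted01 ~ pressure_num, data = toy, family = "binomial")

# Option 2: factor predictor, one coefficient per non-reference level.
toy$pressure_fac <- factor(toy$pressure_num, levels = 0:2)
fit_factor <- glm(posted01 ~ pressure_fac, data = toy, family = "binomial")

names(coef(fit_numeric))  # "(Intercept)" "pressure_num"
names(coef(fit_factor))   # "(Intercept)" "pressure_fac1" "pressure_fac2"
```

With the factor coding, each level gets its own coefficient relative to the reference level 0 (No), which is the form seen in outputs such as feel_pressure1 and feel_pressure2.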
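It is also important to note that logistic regression coefficients are interpreted differently from linear regression coefficients: each coefficient is a change in the log-odds of the outcome, so the exponentiated coefficients (printed beneath each logistic regression output) are odds ratios.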
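For reference, the standard logistic regression model and the odds-ratio reading of its coefficients can be written as follows. The symbols p, β_j and x_j are generic notation introduced here, not names from the dataset.

```latex
\begin{aligned}
\log\frac{p}{1-p} &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\
\text{odds ratio for } x_j &= e^{\beta_j} \\
\text{percentage change in odds} &= 100\,\bigl(e^{\beta_j} - 1\bigr)\,\%
\end{aligned}
```

An odds ratio is not a ratio of probabilities; the two are close only when the outcome is rare.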
The 5 questions, their corresponding hypotheses, and their associated findings are summarised in the section below.
Effects of Parent and child behaviour, on teen’s use of social media
The table below summarises the questions studied, the initial hypothesis, and findings from the data.
| Question | Hypothesis | Findings |
|---|---|---|
| 1. Relationship between parents blocking their child’s phone usage and their child’s online behaviour on sexy/flirty pictures | If teens are rebellious, then the stricter the parent, the more the child wants to be liberal and may be more inclined to send flirty pictures | The regression results suggest that if parents took away their child’s cell phone or internet privileges as punishment, the odds of the teen sending sexy or flirty pictures or videos of themselves are about 169% higher (odds ratio 2.69). There is no statistically significant effect of the other strict parental behaviours: using parental controls to restrict the child’s use of his/her cell phone, and limiting the amount of time or times of day when the child can go online |
| 2. Relationship on whether children self-censor their posts if they are friends with parents on social media | Teens whose parents are friends with them on social media will likely not post details of their relationship online | There was no statistically significant evidence that the parent being connected to their child on social media had an impact on the child posting public affection for their significant other (a proxy for whether the child self-censors their posts). After adding more variables, the regression suggests that teens who feel pressure to only post content that makes them look good to others have around 90% (“Yes, a little”) to 214% (“Yes, a lot”) higher odds of posting public affection towards their significant other. There is no statistically significant evidence that the various parental behaviours (connecting with the child on social media, being friends on Facebook, and checking the child’s social media profile) have an effect on the teen’s public affection behaviour. This seems to suggest that teens are more influenced by their self-perception than by their parents’ interventions with respect to whether or not they post public affection towards their significant other. |
| 3. Relationship between the parent using internet / social media themselves vs. them talking to their child about inappropriate online behavior | Parents are likely to be in a better position to advise their children on online behavior if they themselves use internet or are on social media | …. |
| 4. Relationship between stalking behaviour of parent to child and stalking behaviour of child to his/her boyfriend/girlfriend | If parents stalk teen, teen is more likely to inherit the stalking behaviour and stalk his/her boyfriend/girlfriend | If a parent monitors their child’s location (parent’s stalking behaviour), the teen’s odds of accessing the phone or online accounts of someone they were dating or used to date (child’s stalking behaviour) are around 190% higher (p = 0.011), and the teen’s odds of tracking that person’s location are around 567% higher (p = 0.015). Both effects are significant at the 5% level, so parent monitoring of the child’s location is correlated with both of the child’s stalking behaviours (accessing phone and tracking GPS location). |
| 5. How much more likely are children to trust their significant others and not have the urge to constantly monitor their activities if their parents are more trusting of them? | Children are likely to have healthier relationships and not have insecurities regarding their significant others if they are trusted by their parents | …. |
Question: Relationship between parents blocking their child’s phone usage and their child’s online behaviour on sexy/flirty pictures
Hypothesis: The stricter the parent, the more the child wants to be liberal.
Background info - data cleaning and manipulation
For this question, the relevant dataset questions are:
[Dependent variable]:
KDATE2_G: Have you ever done any of these things to let someone know you were attracted to them or interested in them? Have you sent them sexy or flirty pictures or videos of yourself?
1.Yes
2.No
[Independent variables]:
P14_F: Have you ever used parental controls to restrict your child’s use of his/her cell phone?
1.Yes
2.No
3.Does Not Apply
P13_D: Have you ever taken away your child’s cell phone or internet privileges as punishment?
1.Yes
2.No
3.Does Not Apply
P13_E: Have you ever limited the amount of time or times of day when your child can go online?
1.Yes
2.No
3.Does Not Apply
Data cleaning
Invalid responses were replaced with NA (e.g. if respondents were supposed to choose only options 1 or 2, but the data showed 3 or -1, these would be invalid responses), and rows containing NA were excluded from the analysis.
The order of the options was reversed, so that a higher number represents a stronger response, with 0 for No or None. E.g. if the original order of choices that respondents could choose from was:
1.Yes, 2.No
the responses were re-ordered to:
0.No, 1.Yes
For questions with the options Yes/No/Does not apply, records where the respondent answered Does not apply were removed (by filling them with NA), and No was coded as 0 and Yes as 1.
## after data cleaning, number of valid data records for analysis is: 830
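The recoding rules described above can be sketched as a small helper. This is a minimal illustration, not the report's actual cleaning code: the function name `recode_yes_no` and the example vectors are hypothetical.

```r
# Recode a Yes/No(/Does Not Apply) item: 1 -> 1 (Yes), 2 -> 0 (No),
# anything else (3 = Does Not Apply, -1 or other invalid codes) -> NA.
recode_yes_no <- function(x) {
  out <- rep(NA_integer_, length(x))
  out[x %in% 1] <- 1L
  out[x %in% 2] <- 0L
  out
}

# Hypothetical raw responses, including invalid codes.
raw <- c(1, 2, 3, -1, 2, 1, NA)
recode_yes_no(raw)
# 1 0 NA NA 0 1 NA

# Rows with any NA are then dropped before modelling, as na.omit() does.
toy <- data.frame(y = recode_yes_no(c(1, 2, 2, 1)),
                  x = recode_yes_no(c(1, 3, 2, -1)))
nrow(na.omit(toy))
# 2
```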
Findings
The regression of whether the teen sends flirty pictures or videos on the strictness of parents ((i) controlling cell phone use, (ii) taking away privileges, (iii) limiting time spent online) and on (iv) the teen’s age is shown below.
##
## Call:
## glm(formula = y ~ parent_controls_cell_phone + take_away_privileges +
## limit_time_online + age, family = "binomial", data = na.omit(data_specific))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6945 -0.4843 -0.3711 -0.3008 2.5599
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.95810 1.59326 -5.622 1.88e-08 ***
## parent_controls_cell_phone1 -0.10125 0.32216 -0.314 0.75330
## take_away_privileges1 0.98933 0.32537 3.041 0.00236 **
## limit_time_online1 -0.16482 0.26913 -0.612 0.54026
## age 0.39233 0.09717 4.038 5.40e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 508.20 on 829 degrees of freedom
## Residual deviance: 480.41 on 825 degrees of freedom
## AIC: 490.41
##
## Number of Fisher Scoring iterations: 5
## (Intercept) parent_controls_cell_phone1
## 0.0001286901 0.9037054546
## take_away_privileges1 limit_time_online1
## 2.6894332059 0.8480453622
## age
## 1.4804232140
The take_away_privileges variable is statistically significant at the 1% level (p = 0.002).
The regression results suggest that if parents took away their child’s cell phone or internet privileges as punishment, the odds of the teen sending sexy or flirty pictures or videos of themselves are about 169% higher (odds ratio 2.69). There is no statistically significant effect of the parent using parental controls to restrict the child’s use of his/her cell phone, or of limiting the amount of time or times of day when the child can go online.
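The gap between an odds ratio and a ratio of probabilities can be checked directly from the coefficients reported above. The sketch below plugs the printed estimates into the inverse logit. The choice of a 15-year-old whose parent answered No to the other two items is an arbitrary illustration, not a case singled out in the analysis.

```r
# Coefficients copied from the regression output above (rounded as printed).
b_intercept <- -8.95810
b_take_away <-  0.98933
b_age       <-  0.39233

# Odds ratio and percentage change in odds for take_away_privileges.
odds_ratio <- exp(b_take_away)
pct_change_odds <- 100 * (odds_ratio - 1)

# Predicted probabilities for an illustrative 15-year-old with the other
# two parental items at 0 (No), without and with privileges taken away.
age <- 15
p_no  <- plogis(b_intercept + b_age * age)
p_yes <- plogis(b_intercept + b_age * age + b_take_away)

round(c(odds_ratio = odds_ratio, pct_change_odds = pct_change_odds), 2)
round(c(p_no = p_no, p_yes = p_yes, prob_ratio = p_yes / p_no), 3)
```

For an odds ratio above 1, the ratio of predicted probabilities is always somewhat smaller than the odds ratio, which is why the 169% figure is best read as a change in odds.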
Given that take_away_privileges was significant, the results suggest that parental strictness is associated with the teen’s behaviour on social media. However, to be more conclusive, more aspects of strict parental behaviour, and more possible rebellious behaviours by teens, could be examined if time permitted and more data were available. For example, there was a lack of data on other proxies of rebellious behaviour, or of a desire to be more liberal in response to parental strictness, besides sending flirty pictures or videos. Perhaps the survey could have included questions on whether the child had found alternative ways to access social media, such as using a friend’s internet device, or telling parents that they were studying at school while actually using a library computer to access social media. With more data, the investigation could determine more comprehensively whether parents being strict made the situation worse than if they had not intervened.
Question: Relationship on whether children self-censor their posts if they are friends with parents on social media.
Hypothesis: Teens whose parents are friends with them on social media will likely not post details of their relationship online.
Background info - data cleaning and manipulation
For this question, the relevant dataset questions are:
[Dependent variable]:
KRSNS3_C: When you use social media do you ever tell your boyfriend, girlfriend or significant other how much you like them in a way that other people can see?
1.Yes
2.No
[Independent variables]:
P10: Are you connected with your child on any social media sites?
1.Yes
2.No
Similar data cleaning and manipulation operations were performed to get rid of invalid responses and to re-order values from least to most.
## after data cleaning, number of valid data records for analysis is: 313
Findings
##
## Pearson's product-moment correlation
##
## data: data_specific$parent_connect_child and data_specific$teen_public_affection
## t = 1.0954, df = 311, p-value = 0.2742
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.04920595 0.17167423
## sample estimates:
## cor
## 0.06199316
As the p-value is 0.2742, there is no statistically significant evidence to show that the two variables are correlated.
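The reported t statistic follows from the sample correlation and the sample size through the standard formula for a Pearson correlation test. With r ≈ 0.06199 and n = 313 (so n − 2 = 311 degrees of freedom):

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.06199 \times \sqrt{311}}{\sqrt{1 - 0.06199^{2}}}
  \approx \frac{1.0932}{0.9981}
  \approx 1.095
```

This agrees with the t = 1.0954 printed in the output.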
##
## Call:
## glm(formula = teen_public_affection ~ parent_connect_child +
## age, family = "binomial", data = na.omit(data_specific))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.138 -1.011 -1.010 1.353 1.354
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3955314 1.3493661 -0.293 0.769
## parent_connect_child1 0.3098837 0.2857402 1.084 0.278
## age -0.0006381 0.0862793 -0.007 0.994
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 424.89 on 312 degrees of freedom
## Residual deviance: 423.70 on 310 degrees of freedom
## AIC: 429.7
##
## Number of Fisher Scoring iterations: 4
## (Intercept) parent_connect_child1 age
## 0.6733221 1.3632666 0.9993621
The factor of parent connecting to child is not statistically significant.
Further examination of the data:
It seems that when the parent is connected to the child (option 1 / “Yes”), the proportion of teens who express their affection for their significant other publicly, is higher.
A further proportion test is done:
prob_table <- table(data_specific$parent_connect_child, data_specific$teen_public_affection)
# inverse the columns, as success is option 2 for teen_public_affection
prob_table <- cbind(prob_table[, 2], prob_table[, 1])
prop.test(prob_table)
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: prob_table
## X-squared = 0.90961, df = 1, p-value = 0.3402
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.22359377 0.07121282
## sample estimates:
## prop 1 prop 2
## 0.4000000 0.4761905
The p-value is 0.3402, so there is no statistically significant evidence that the two proportions differ. As with the correlation test, this suggests that the parent’s social media connection with the child does not have a significant effect on whether the teen shows public affection for a significant other on social media.
Given these initial results, more variables which may relate to the hypothesis were added to the regression model:
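For a 2×2 table, `prop.test` with its default continuity correction gives the same statistic and p-value as Pearson's chi-squared test with Yates' correction, so either can be reported. The sketch below illustrates this with made-up counts; `counts` and its values are hypothetical, not the survey tabulation.

```r
# Hypothetical 2x2 table: rows = parent not connected / connected,
# columns = teen posts affection (success) / does not (failure).
counts <- matrix(c(18, 32,
                   60, 50),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(parent_connected = c("No", "Yes"),
                                 posts_affection  = c("Yes", "No")))

pt <- prop.test(counts)    # 2-sample test of equal proportions
ct <- chisq.test(counts)   # chi-squared test with Yates' correction

# Both tests give the same statistic and p-value for a 2x2 table.
c(prop_test = unname(pt$statistic), chisq_test = unname(ct$statistic))
c(prop_test = pt$p.value, chisq_test = ct$p.value)
pt$estimate  # sample proportion of successes in each row
```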
KFSNS1_E: In general, does social media make you feel pressure to only post content that makes you look good to others? 1.Yes, a lot,2.Yes, a little,3.No
To test if child’s self perception is more strongly correlated to his/her behaviour regarding posting public affection for significant other
P8: Are you friends with your child on Facebook? 1.Yes, 2.No
To see if parent being friends on Facebook (compared to a more generic social media platform as asked by question P10), is a significant factor
P13_C: Have you ever checked your child’s profile on a social networking site? 1(YES)-2(NO)-3(does Not Apply)
To see if parent checking child’s profile is a significant factor. Perhaps if the child knows that his/her parent checks his/her profile, he/she will be more restrained in his/her post.
## after data cleaning, number of valid data records for analysis is: 223
##
## Call:
## glm(formula = teen_public_affection ~ parent_connect_child +
## age + feel_pressure + friends_facebook + check_child_profile,
## family = "binomial", data = na.omit(data_specific))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5361 -0.9536 -0.8478 1.1872 1.5624
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.03173 1.87908 0.549 0.58296
## parent_connect_child1 0.11567 0.36829 0.314 0.75346
## age -0.10619 0.11194 -0.949 0.34278
## feel_pressure1 0.63988 0.31683 2.020 0.04342 *
## feel_pressure2 1.14303 0.44113 2.591 0.00957 **
## friends_facebook1 0.07365 0.40218 0.183 0.85469
## check_child_profile1 -0.17095 0.37311 -0.458 0.64682
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 302.98 on 222 degrees of freedom
## Residual deviance: 291.48 on 216 degrees of freedom
## AIC: 305.48
##
## Number of Fisher Scoring iterations: 4
## (Intercept) parent_connect_child1 age
## 2.8059157 1.1226306 0.8992518
## feel_pressure1 feel_pressure2 friends_facebook1
## 1.8962579 3.1362414 1.0764333
## check_child_profile1
## 0.8428603
The regression suggests that teens who feel pressure to only post content that makes them look good to others have around 90% (“Yes, a little”) to 214% (“Yes, a lot”) higher odds of posting public affection towards their significant other. There is no statistically significant evidence that the various parental behaviours (connecting with the child on social media, being friends on Facebook, and checking the child’s social media profile) have an effect on the teen’s public affection behaviour.
This seems to suggest that teens are more influenced by their self-perception, rather than their parent’s interventions, with respect to whether or not they post public affection towards his/her significant other.
Question: Relationship between the parent using internet / social media themselves vs. them talking to their child about inappropriate online behavior.
Hypothesis: Parents are likely to be in a better position to advise their children on online behavior if they themselves use internet or are on social media
Background info - data cleaning and manipulation
For this question, the relevant dataset questions are:
[Dependent variable]:
parents_advice_score: This is a calculated column, which is basically the sum of P15_B and P15_C. It is a measure of how often a parent talks to his/her child about appropriate/inappropriate online behavior in general.
Takes values between 2 (lowest) and 8 (highest)
P15_B: How often do you talk with your child about what is appropriate or inappropriate to share online?
1.Never
2.Rarely
3.Occasionally
4.Frequently
P15_C: How often do you talk with your child about what is appropriate or inappropriate content for them to be viewing online?
1.Never
2.Rarely
3.Occasionally
4.Frequently
[Independent variables]:
P2_A: Do you ever use Facebook?
1.Yes
2.No
P2_B: Do you ever use Twitter?
1.Yes
2.No
P2_C: Do you ever access the internet on a cell phone, tablet or other mobile handheld device, at least occasionally?
1.Yes
2.No
P2_D: Do you ever use some other social media site?
1.Yes
2.No
Data cleaning
Invalid responses were removed by filling them with NA (e.g. if respondents were supposed to choose only options 1 or 2, but the data showed 3 or -1, these would be invalid responses).
The numeric coding was changed such that 0 represented a ‘No’ and 1 a ‘Yes’. E.g. if the original order of choices that respondents could choose from was:
1.Yes, 2.No
the responses were re-ordered to:
0.No, 1.Yes
For questions with options Yes/No/Does not apply, ‘Does not apply’ was treated as an NA.
For the calculated column, ‘parents_advice_score’, if one of the two variables being summed was missing, the other was multiplied by 2 to get the parents_advice_score. If both of the variables had missing values in a row, ‘parents_advice_score’ was also treated as a missing value.
## After data cleaning, the number of valid data records for analysis is: 1070
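The rule for parents_advice_score described above can be sketched as follows. The helper name `advice_score` and the example responses are hypothetical; only the rule itself (sum the two items, double the remaining one if the other is missing, NA if both are missing) comes from the text.

```r
# Hypothetical P15_B / P15_C responses on the 1 (Never) to 4 (Frequently) scale.
toy <- data.frame(P15_B = c(4, 3, NA, NA, 1),
                  P15_C = c(3, NA, 2, NA, 1))

# Sum the two items; if exactly one is missing, double the other;
# if both are missing, the score stays NA.
advice_score <- function(item_b, item_c) {
  ifelse(is.na(item_b) & is.na(item_c), NA,
         ifelse(is.na(item_b), 2 * item_c,
                ifelse(is.na(item_c), 2 * item_b, item_b + item_c)))
}

toy$parents_advice_score <- advice_score(toy$P15_B, toy$P15_C)
toy$parents_advice_score
# 7 6 4 NA 2
```

Doubling the available item keeps the score on the same 2 to 8 scale, at the cost of assuming the missing item would have received the same answer.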
First, the Pearson correlations between parents_advice_score and each of the independent variables are examined.
## parents_advice_score use_facebook use_twitter use_other_social_media
## [1,] 1 0.04818765 0.03943719 0.04881136
## use_internet
## [1,] 0.07040695
Correlation tests were run on each of these, and only the correlation with use_internet was significantly different from 0 at the 5% significance level. Its results are shown below:
##
## Pearson's product-moment correlation
##
## data: local_data4$parents_advice_score and local_data4$use_internet
## t = 2.3099, df = 1071, p-value = 0.02108
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.0106054 0.1297067
## sample estimates:
## cor
## 0.07040695
Plotting
Mean test for gender:
##
## Welch Two Sample t-test
##
## data: local_data4$parents_advice_score[local_data4$gender == "Male"] and local_data4$parents_advice_score[local_data4$gender == "Female"]
## t = -1.5124, df = 1071.5, p-value = 0.06536
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf 0.01302965
## sample estimates:
## mean of x mean of y
## 6.243902 6.391144
Mean test for age:
##
## Welch Two Sample t-test
##
## data: local_data4$parents_advice_score[local_data4$age == 13 | local_data4$age == 14 | local_data4$age == 15] and local_data4$parents_advice_score[local_data4$age == 16 | local_data4$age == 17]
## t = 4.2845, df = 891.97, p-value = 1.015e-05
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.2625846 Inf
## sample estimates:
## mean of x mean of y
## 6.489130 6.062645
For this test, the teens were divided into two age groups, 13-15 and 16-17, to test whether the mean parents_advice_score differs between them.
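The age-group comparison can be reproduced in outline as follows. The data here are simulated stand-ins (the data frame `toy` and its columns are hypothetical), so the numbers will not match the output above; the point is only the grouping and the one-sided Welch test.

```r
# Simulated stand-in for the survey data; scores and ages are made up.
set.seed(42)
toy <- data.frame(
  age   = sample(13:17, 400, replace = TRUE),
  score = sample(2:8, 400, replace = TRUE)
)

# Split into the two age groups used in the report.
younger <- toy$score[toy$age %in% 13:15]
older   <- toy$score[toy$age %in% 16:17]

# One-sided test: is the mean score for the younger group greater?
# t.test() uses the Welch (unequal variance) version by default.
res <- t.test(younger, older, alternative = "greater")
res$statistic
res$p.value
```

Using `%in%` avoids the long chain of `==` and `|` conditions seen in the output's data line.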
Initial Regression Model:
##
## Call:
## lm(formula = parents_advice_score ~ use_facebook + use_twitter +
## use_internet + use_other_social_media + age + gender, data = local_data4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7443 -0.6870 0.0081 1.4142 2.4087
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.32264 0.51751 16.082 < 2e-16 ***
## use_facebookYes 0.13986 0.11189 1.250 0.2116
## use_twitterYes 0.05655 0.13082 0.432 0.6656
## use_internetYes 0.24929 0.12916 1.930 0.0539 .
## use_other_social_mediaYes 0.07309 0.11542 0.633 0.5267
## age -0.15134 0.03365 -4.498 7.62e-06 ***
## genderMale -0.15854 0.09640 -1.645 0.1003
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.578 on 1065 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.02801, Adjusted R-squared: 0.02254
## F-statistic: 5.115 on 6 and 1065 DF, p-value: 3.424e-05
Final Model:
##
## Call:
## lm(formula = parents_advice_score ~ use_facebook + use_internet +
## age + gender, data = local_data4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.791 -0.640 -0.028 1.373 2.411
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.32154 0.51703 16.095 < 2e-16 ***
## use_facebookYes 0.16478 0.10860 1.517 0.1295
## use_internetYes 0.27418 0.12643 2.169 0.0303 *
## age -0.15147 0.03362 -4.505 7.38e-06 ***
## genderMale -0.15752 0.09634 -1.635 0.1023
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.577 on 1067 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.02718, Adjusted R-squared: 0.02353
## F-statistic: 7.452 on 4 and 1067 DF, p-value: 6.406e-06
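Only use_internet and age are significant at the 5% level. The negative age coefficient indicates that parents talk to their children about online behaviour less often as the children grow older.
Facebook usage and gender are close to being significant at the 10% level (p = 0.13 and p = 0.10 respectively). More data might make these two variables significant in this model as well.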
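The two models can be compared through their adjusted R², which penalises additional predictors. Applying the standard formula to the printed outputs, with n = 1072 observations used (the residual degrees of freedom plus the number of estimated coefficients) and p predictors, reproduces the reported values:

```latex
\begin{aligned}
\bar{R}^{2} &= 1 - (1 - R^{2})\,\frac{n - 1}{n - p - 1} \\
\text{Initial model } (p = 6):\quad \bar{R}^{2} &= 1 - (1 - 0.02801)\,\frac{1071}{1065} \approx 0.0225 \\
\text{Final model } (p = 4):\quad \bar{R}^{2} &= 1 - (1 - 0.02718)\,\frac{1071}{1067} \approx 0.0235
\end{aligned}
```

Dropping use_twitter and use_other_social_media lowers R² slightly but raises adjusted R², which is consistent with preferring the smaller model. The report does not state which criterion was actually used to choose the final model.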
Question: Relationship between stalking behaviour of parent to child and stalking behaviour of child to his/her boyfriend/girlfriend.
Hypothesis: If parents stalk teen, teen is more likely to inherit the stalking behaviour and stalk his/her boyfriend/girlfriend
Background info - data cleaning and manipulation
For this question, the relevant dataset questions are:
[Dependent variables]:
KR13_C: Have you ever done any of the following to someone you were dating or used to date. Accessed their mobile phone or online accounts
1.Yes
2.No
KR13_F: Have you ever done any of the following to someone you were dating or used to date. Downloaded a GPS or tracking program to their cell phone without them knowing
1.Yes
2.No
[Independent variables]:
P13_B: Have you ever checked which websites your child visited?
1.Yes
2.No
3.Does Not Apply
P14_G: Have you ever used monitoring tools to track your child’s location with his/her cell phone?
1.Yes
2.No
3.Does Not Apply
P14_H: Have you ever looked at the phone call records or messages on your child’s phone?
1.Yes
2.No
3.Does Not Apply
Similar data cleaning and manipulation operations were performed to get rid of invalid responses and to re-order values from least to most.
## after data cleaning, number of valid data records for analysis is: 317
Findings
##
## Call:
## glm(formula = access_date_phone ~ check_child_website + monitor_child_location +
## look_child_phone_records + gender + age, family = "binomial",
## data = na.omit(data_specific))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.8697 -0.4799 -0.3788 -0.3060 2.5490
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.9877 2.4699 -2.019 0.0434 *
## check_child_website1 0.2666 0.4489 0.594 0.5525
## monitor_child_location1 1.0593 0.4148 2.554 0.0107 *
## look_child_phone_records1 0.2878 0.4626 0.622 0.5339
## genderFemale 0.6471 0.3979 1.626 0.1039
## age 0.1147 0.1521 0.754 0.4509
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 207.42 on 316 degrees of freedom
## Residual deviance: 193.82 on 311 degrees of freedom
## AIC: 205.82
##
## Number of Fisher Scoring iterations: 5
## (Intercept) check_child_website1
## 0.006821593 1.305573261
## monitor_child_location1 look_child_phone_records1
## 2.884326470 1.333468483
## genderFemale age
## 1.910007714 1.121517752
It seems that if a parent monitors their child’s location (parent’s stalking behaviour), the teen’s odds of accessing the phone or online accounts of someone they were dating or used to date (child’s stalking behaviour) are around 190% higher (odds ratio 2.88, significant at the 5% level).
##
## Call:
## glm(formula = track_date_loc ~ check_child_website + monitor_child_location +
## look_child_phone_records + gender + age, family = "binomial",
## data = na.omit(data_specific))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0646 -0.2186 -0.1659 -0.1138 3.0206
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.1906 4.1111 1.019 0.3080
## check_child_website1 -0.5984 0.8501 -0.704 0.4814
## monitor_child_location1 1.8977 0.7771 2.442 0.0146 *
## look_child_phone_records1 -0.7579 0.8398 -0.902 0.3668
## genderFemale 0.4984 0.7610 0.655 0.5125
## age -0.5275 0.2713 -1.945 0.0518 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 74.668 on 316 degrees of freedom
## Residual deviance: 64.901 on 311 degrees of freedom
## AIC: 76.901
##
## Number of Fisher Scoring iterations: 7
## (Intercept) check_child_website1
## 66.0622783 0.5496711
## monitor_child_location1 look_child_phone_records1
## 6.6704953 0.4686308
## genderFemale age
## 1.6461638 0.5900521
It seems that if a parent monitors their child’s location (parent’s stalking behaviour), the teen’s odds of tracking the location of someone they were dating or used to date (child’s stalking behaviour) are around 567% higher (odds ratio 6.67, significant at the 5% level). It is interesting to note that this coefficient is much larger than that for the previous child stalking behaviour (accessing the phone of a significant other), although its standard error is also larger. This seems to indicate that a parent’s behaviour correlates more strongly with the child’s most similar behaviour (both involve tracking location).
In both cases, parent monitoring child’s location seemed correlated with both child’s stalking behaviours (accessing phone and tracking GPS location).
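The overall fit of each model can also be gauged from the null and residual deviances printed above, using a likelihood-ratio (deviance) test against the intercept-only model. This test is not part of the original analysis; the sketch below simply applies it to the printed deviances, and the helper name `lr_test` is hypothetical.

```r
# Likelihood-ratio test from a glm summary's deviances and degrees of freedom.
lr_test <- function(null_dev, null_df, resid_dev, resid_df) {
  stat <- null_dev - resid_dev        # drop in deviance
  df   <- null_df - resid_df          # number of added coefficients
  c(statistic = stat, df = df,
    p_value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Model for accessing a partner's phone or online accounts.
lr_access <- lr_test(207.42, 316, 193.82, 311)
# Model for installing a tracking program on a partner's phone.
lr_track  <- lr_test(74.668, 316, 64.901, 311)

round(rbind(access_date_phone = lr_access, track_date_loc = lr_track), 4)
```

With five added predictors, the first model improves on the intercept-only model at the 5% level, while the second does so only at the 10% level, so the tracking results in particular should be read with caution given the large standard errors in that output.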
Given that only one of the three parental stalking behaviours was statistically significant, a possible alternative hypothesis is that the teen’s stalking behaviour is influenced more by his/her own characteristics. Various other questions regarding the teen were explored, and are listed below.
KR8: How frequently you expect to hear from your boyfriend/girlfriend/significant other in some way? 1.Hourly,2.Every few hours,3.Once a day,4.A few times a week,5.Once a week,6.Less often
KR3_A: Have you ever searched for information online about someone you were currently dating or were interested in dating? 1(YES)-2(NO)
KR3_C: Have you ever searched for information online about someone you dated or hooked up with in the past? 1(YES)-2(NO)
KF15: Have you ever had a fight with any of your friends that started because of something that happened online or because of a text? 1(YES)-2(NO)
KFSNS1_B: In general, does social media make you feel worse about your own life because of what you see from other friends on social media? 1.Yes, a lot,2.Yes, a little,3.No
KFSNS1_E: In general, does social media make you feel pressure to only post content that makes you look good to others? 1.Yes, a lot,2.Yes, a little,3.No
##
## Call:
## glm(formula = track_date_loc ~ check_child_website + monitor_child_location +
## look_child_phone_records + feel_worse + gender + age, family = "binomial",
## data = na.omit(data_specific))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4557 -0.1925 -0.1552 -0.1155 3.0441
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.6636 4.6562 0.572 0.5673
## check_child_website1 -0.6114 0.9956 -0.614 0.5391
## monitor_child_location1 0.8966 0.9302 0.964 0.3351
## look_child_phone_records1 -0.8473 0.9826 -0.862 0.3885
## feel_worse1 0.3441 1.2156 0.283 0.7771
## feel_worse2 2.4532 1.0162 2.414 0.0158 *
## genderFemale 0.0326 0.8686 0.038 0.9701
## age -0.4163 0.2988 -1.393 0.1635
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 65.315 on 276 degrees of freedom
## Residual deviance: 51.912 on 269 degrees of freedom
## AIC: 67.912
##
## Number of Fisher Scoring iterations: 7
## (Intercept) check_child_website1
## 14.3484541 0.5425950
## monitor_child_location1 look_child_phone_records1
## 2.4512708 0.4285541
## feel_worse1 feel_worse2
## 1.4107123 11.6257104
## genderFemale age
## 1.0331405 0.6594689
After testing these various questions, only the “feel worse” question (KFSNS1_B) was statistically significant. It is interesting to see that feeling a lot worse (feel_worse2) had a much higher coefficient than feeling a little worse (feel_worse1). So perhaps a teen with lower self-esteem would be more likely to stalk his/her significant other. It is also interesting to note that adding the feel_worse variable made the parent’s monitor_child_location variable no longer statistically significant, although the sample also shrank from 317 to 277 records.
Question: How much more likely are children to trust their significant others, and not feel the urge to constantly monitor their activities, if their parents are more trusting of them?
Hypothesis: Children are likely to have healthier relationships, and fewer insecurities regarding their significant others, if they are trusted by their parents.
Background info - data cleaning and manipulation
For this question, the relevant dataset questions are:
[Dependent variable]:
KRSNS3_A: When you use social media do you ever keep track of where your boyfriend, girlfriend or significant other is or what they are doing?
1.Yes
2.No
[Independent variables]:
P13_A: Have you ever used parental controls or other technological means of blocking, filtering or monitoring your child’s online activities?
1.Yes
2.No
3.Does Not Apply
P13_B: Have you ever checked which websites your child visited?
1.Yes
2.No
3.Does Not Apply
P14_G: Have you ever used monitoring tools to track your child’s location with his/her cell phone?
1.Yes
2.No
3.Does Not Apply
P14_H: Have you ever looked at the phone call records or messages on your child’s phone?
1.Yes
2.No
3.Does Not Apply
Data cleaning
Similar data cleaning and manipulation operations were performed to remove invalid responses and to re-order values from least to most.
## After data cleaning, the number of valid data records for analysis is: 279
Correlation Matrix:
## kid_tracking_significant_others_activity used_parental_controls
## [1,] 1 0.0583896
## checked_websites_visited tracked_childs_location
## [1,] 0.05163415 0.1549288
## tracked_calls_and_messages
## [1,] 0.05825054
Correlation tests were run to see whether there is statistically significant evidence that any of these correlations differs from 0. Only the correlation with tracked_childs_location was significantly different from 0; its results are shown below:
##
## Pearson's product-moment correlation
##
## data: local_data5$kid_tracking_significant_others_activity and local_data5$tracked_childs_location
## t = 2.6614, df = 288, p-value = 0.008219
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.04047146 0.26537295
## sample estimates:
## cor
## 0.1549288
Prop test:
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: table(local_data5$kid_tracking_significant_others_activity, local_data5$gender)
## X-squared = 5.4052, df = 1, p-value = 0.01004
## alternative hypothesis: less
## 95 percent confidence interval:
## -1.00000000 -0.04337236
## sample estimates:
## prop 1 prop 2
## 0.4495413 0.6000000
This suggests that males are less likely than females to keep track of their significant other on social media. This might go against the general perception that females tend to trust their partners more; the behavior on social media appears to be otherwise. It may also mean that females tend to give more importance to their relationships than males in general, whereas males may spend more of their time on social media on other things.
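When `prop.test` is given a two-column table, it treats each row as a group and the first column as the count of "successes", so it is worth checking which orientation the reported proportions refer to. A small sketch with hypothetical counts (not survey data) illustrates this:

```r
# Hypothetical counts: rows are the two groups, the first column is "successes".
counts <- matrix(c(30, 20,
                   45, 55),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("yes", "no")))

res <- prop.test(counts, alternative = "less")

# The reported proportions are first-column counts divided by row totals.
row_props <- counts[, "yes"] / rowSums(counts)
print(res$estimate)
print(row_props)
```

For a call of the form `table(outcome, gender)`, the rows are therefore the outcome categories, and "prop 1" and "prop 2" are the shares of the first gender level within each outcome category.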
Regression:
##
## Call:
## glm(formula = kid_tracking_significant_others_activity ~ checked_websites_visited +
## tracked_childs_location + age + gender, family = "binomial",
## data = local_data5)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2701 -0.8479 -0.6941 1.1447 1.9344
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.00990 1.65517 -1.214 0.22463
## checked_websites_visitedYes 0.26521 0.28324 0.936 0.34909
## tracked_childs_locationYes 0.78818 0.30482 2.586 0.00972 **
## age 0.06893 0.10325 0.668 0.50434
## genderMale -0.58996 0.26565 -2.221 0.02637 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 351.89 on 288 degrees of freedom
## Residual deviance: 337.30 on 284 degrees of freedom
## (792 observations deleted due to missingness)
## AIC: 347.3
##
## Number of Fisher Scoring iterations: 4
## (Intercept) checked_websites_visitedYes
## 0.1340024 1.3037090
## tracked_childs_locationYes age
## 2.1993808 1.0713654
## genderMale
## 0.5543491
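Tracking the child’s location is significant at the 1% level and gender at the 5% level. In terms of odds ratios, teens whose parents tracked their location have about 2.2 times the odds of keeping track of their significant other, while the odds for male teens are about 0.55 times those of female teens, holding the other variables constant.
However, it is important to note that the number of parents who use monitoring tools to track their child’s location is not particularly large in this survey, compared to the other ways parents use to monitor their child. Further research would be needed to see whether the effect holds in larger samples.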
The analysis was limited to dependent variables with binary (Yes/No) options. Ideally, dependent variables with more than 2 options should also be considered. A few interesting hypotheses were considered in that respect as well for further study. However, since that would have required an implementation of ordinal logistic regression, perhaps using the polr function in the MASS library, the analysis was kept to these relatively simpler models in the interest of time. With additional time, it would have been interesting to investigate and understand this further.
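As an illustration of the ordinal approach mentioned above, the following is a minimal sketch of a proportional-odds model fitted with `polr` from MASS. The data are simulated, and the names `toy`, `parent_monitors` and `feel_worse` are hypothetical stand-ins rather than the survey variables.

```r
library(MASS)  # provides polr(); MASS ships with standard R distributions

# Hypothetical data: values are simulated, not taken from the survey.
set.seed(42)
n <- 400
toy <- data.frame(parent_monitors = rbinom(n, 1, 0.4))
# Latent score shifted upward when the parent monitors, then cut into 3 ordered levels.
latent <- 0.7 * toy$parent_monitors + rlogis(n)
toy$feel_worse <- cut(latent, breaks = c(-Inf, -0.5, 1, Inf),
                      labels = c("No", "Yes, a little", "Yes, a lot"),
                      ordered_result = TRUE)

# Proportional-odds (ordinal logistic) model; Hess = TRUE keeps the Hessian for standard errors.
fit <- polr(feel_worse ~ parent_monitors, data = toy, Hess = TRUE)
print(summary(fit))
# Exponentiated slope: odds ratio of being in a higher category.
print(exp(coef(fit)))
```

The response has to be an ordered factor, which is why the re-ordering of answer levels from least to most matters for this kind of model.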
Also, since there were a lot of missing values in the dataset, some of the models were based on a relatively small sample. With a larger number of complete observations, it might have been possible to get more interesting relationships that would come out to be statistically significant, which might not have been apparent in this relatively small number of complete responses.
A more robust analysis of the effect of parents’ behaviours on teens’ behaviours with regard to social media usage (e.g. stalking behaviour, defiance / urge to be liberalised etc.) could be conducted if there were more data. Perhaps more questions could be put to parent and teen in this area, to further tease out these possible relationships. More historical data, not just for one year, could be collated and analysed to discover whether there are any trends, or whether these parent-child relationships are consistent over time for 13-17 year olds across cohorts. It would also be interesting if the survey were conducted for older children, beyond 17 years old, to test whether certain hypotheses or relationships between parent and child persist beyond these ages. For example, if the data suggest that parents’ stalking behaviour is positively correlated with teens’ stalking behaviour, it would be interesting to investigate whether the teen’s stalking behaviour persists when he/she is, say, 25 years old. If so, it would suggest that parents should be educated about which interventions are appropriate, since well-intended interventions, if executed in the wrong way, could perhaps lead to a more permanent adverse impact on the teen.
Regarding the analyses used in this section, the approach was rather simplistic. For instance, not much attention was paid to the behaviour of the residuals in the models. Ideally, residuals and other diagnostics should be examined to make sure that no important variables are being missed, since omitted variables might have led to biased coefficient estimates in the analysis. Also, when more ordered factor or continuous variables are included to test further relationships, transformations of the variables may lead to better-fitting models.
In addition, since many of the variables in the dataset have binary responses, the phi coefficient (available in the psych library) is a more natural measure of the association between variables in the initial testing phase than Pearson’s correlation, which is primarily suited to continuous variables.
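For reference, the phi coefficient of a 2x2 table can be computed directly from the cell counts in base R. The sketch below uses made-up 0/1 responses (not survey data) and also prints Pearson's correlation on the same codes; for two variables coded 0/1 the two quantities coincide numerically, so the choice mainly affects how the statistic is described and tested.

```r
# Hypothetical binary responses (1 = Yes, 0 = No); not survey data.
x <- c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1)
y <- c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0)

# 2x2 contingency table with cells a, b, c, d.
tab <- table(x, y)
a <- tab["1", "1"]; b <- tab["1", "0"]
c_ <- tab["0", "1"]; d <- tab["0", "0"]

# Phi coefficient from the cell counts.
phi <- (a * d - b * c_) / sqrt((a + b) * (c_ + d) * (a + c_) * (b + d))
phi <- unname(phi)

print(phi)
print(cor(x, y))  # Pearson's r on the 0/1 codes
```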
Finally, as mentioned earlier, getting a larger set of complete responses by increasing the initial sample size and the use of ordinal logistic regression for exploring some additional relationships can lead to the identification of some more interesting patterns and relationships in the dataset.
The dataset had a number of questions that were posed to the teen about their close friends and relationships with their significant others.
A number of hypotheses were investigated using this dataset, to see if there were any relationships between teens’ behaviours and their relationships with close friends / significant others.
For example, teens were asked about the amount of time they spend with their closest friends (either online or face-to-face) and which social media accounts they own. This was an opportunity to investigate whether social media accounts have an effect on the amount of time teenagers spend with their friends.
Effects of Social Media on Teens’ Relationships with Friends and Girlfriends/Boyfriends
The table below summarises the questions being studied, the initial hypotheses, and the findings drawn from the data.
| Question | Hypothesis | Findings |
|---|---|---|
| 1. Number of friends and followers on social media varies by gender | Female teens seem more attached to social media than male teens, so the number of friends and followers of female users might be higher than that of male users | On average, female teens have more friends on Facebook and more followers on Instagram than male teens. The number of social media accounts that female teens use is also slightly higher than for male teens. |
| 2. Relationship between the number of social media accounts and electronic devices teens have and how much time they spend with their close friends | The more social media accounts/electronic devices a teen has, the more time he/she spends with close friends | Tests show a positive correlation in both cases. Teens tend to spend more time with their close friends - either face-to-face, by phone, or through any other media - when they have more social media accounts or electronic devices. This might indicate that social media and electronic devices help teens to communicate or get along with their friends. |
| 3. Relationship between teens’ online dating experiences and their perception of other people’s image on social media | Teens who think that people tend to show a different side of themselves on social media would be less likely to experience online dating | Results show that there is a correlation between the two variables. One might think that people would be less likely to date online when they are aware that other people are showing a different side of themselves. However, teens do in fact date online, without ever meeting the girlfriend/boyfriend in person, even though they are aware that the other person might not be presenting his/her true self. |
| 4. Do teens send flirtatious messages or flirty pictures or videos when they are attracted to someone? | Teens aged 13-17 will be less likely to send flirty messages, pictures, or videos to someone they find attractive | It turned out that teens with an African American background and teens with a Japanese background have a positive tendency to send flirtatious messages, pictures, or videos to show their interest in somebody else |
| 5. How would teens react when they break up with their girlfriend/boyfriend? Will they block or unfriend their ex on social media or even remove their ex from their phone address book? | Younger teenagers might do such things when they break up, but this behaviour will diminish as they get older | Results show that female teens are more likely than male teens to unfriend/block their ex. It can also be seen that the older teens are, the more likely they are to remove their ex from their phone address book. This is in contrast to what one would expect to see before testing. |
Hypothesis: Female teens seem more attached to social media than male teens, so the number of friends and followers of female users might be higher than that of male users
Background info
For this question, the relevant dataset questions are:
K6_1: Which of the following social media do you use? Facebook?
K6_2: Which of the following social media do you use? Twitter?
K6_3: Which of the following social media do you use? Instagram?
K6_4: Which of the following social media do you use? Google+?
K6_5: Which of the following social media do you use? Snapchat?
K6_6: Which of the following social media do you use? Vine?
K6_7: Which of the following social media do you use? Tumblr?
The options respondents had were:
1. Yes
2. No
The number of social media accounts a person has can be calculated by summing up all “Yes” answers.
KFB1A: How many friends do you have on Facebook? The options respondents had ranged from 0 to 9999
KFB1B: How many followers do you have on Instagram? The options respondents had ranged from 0 to 9999
KFB1C: How many followers do you have on Twitter? The options respondents had ranged from 0 to 9999
Child_gender: Is your child (the user of social media) male or female? The options respondents had were:
1. Male
2. Female
Findings
Social media behavior may vary by gender. In general, females are assumed to be more active on social media than males. In terms of friends or followers on social media, females do indeed have more than males.
Survey data from 249 female and 231 male respondents shows that the number of Facebook friends of female users is considerably higher than that of male users.
On average, females have 304 friends on Facebook while males have 221. The median is also lower for males, at only 100 friends, while the median for females is 56 higher. It can be seen from the graph that females have a wider range of Facebook friends, with a maximum of 5000. This is presumably because of the Facebook policy that limits an account to a maximum of 5000 friends. In this case, it is assumed that a user has only one Facebook account and will not create a new one in order to have more than 5000 friends. From the t test, it can be concluded that the mean number of Facebook friends of female teen users is higher than that of male teen users, as the p-value of 0.03822 is below the 5% significance level.
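A comparison of two group means like this one can be run with `t.test`. The sketch below uses simulated friend counts and assumes a one-sided Welch test; the report does not state which variant was actually used, so the choice of `alternative` here is an assumption.

```r
# Hypothetical friend counts, simulated for illustration only.
set.seed(7)
female <- round(rexp(120, rate = 1 / 300))
male   <- round(rexp(110, rate = 1 / 220))

# Welch two-sample t test (unequal variances is t.test's default).
# alternative = "greater" tests whether mean(female) exceeds mean(male).
res <- t.test(female, male, alternative = "greater")
print(res)
```

Because friend counts are strongly right-skewed and capped at 5000, a rank-based test such as `wilcox.test` could serve as a robustness check.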
Besides Facebook, another social media platform that is currently popular among teenagers is Instagram. Survey data from the 206 female and 112 male respondents who use Instagram shows that the number of Instagram followers of female users is also higher than that of male users.
This can be computed from the survey questions that indicate the gender of the social media user and the number of Instagram followers that user has.
On average, females have more followers, at 417, compared to only 280 for males. Again, females have a wider interquartile range than males. The lower quartile for females is 68 followers and the upper quartile is 450, while for males the lower quartile is only 34 followers and the upper quartile is around 200 followers lower than for females.
Instagram is slightly different from Facebook. Instagram users have followers rather than friends, which means that user A can follow user B even if B does not follow A back. On Facebook, friends are other users who have been accepted as friends, so if A is a friend of B then B must be a friend of A. This might explain why, on average, the number of Instagram followers is higher than the number of Facebook friends for both female and male users.
| Average Number of Friends/Followers | Facebook friends | Instagram followers |
|---|---|---|
| Female User | 304 | 417 |
| Male User | 221 | 280 |
The plot below shows that most females have up to four social media accounts and most males up to three. The median is also higher for females, at three social media accounts compared to two for males. This might explain why females tend to have more friends and followers on social media. People usually link to their other social media accounts in their profile, for example by putting a link to their Snapchat account in their Instagram profile. Thus the more social media accounts a person has, the more likely that person is to be found online.
## [1] 2.992495
## [1] 2.248077
The number of Facebook friends that a teen has can be predicted using the number of followers on Instagram and Twitter. The regression below has an adjusted R-squared of 0.3396 (multiple R-squared 0.3454), which means that roughly 34% of the variation in the number of Facebook friends can be explained by the number of Instagram and Twitter followers.
##
## Call:
## lm(formula = FBFriends ~ IGFol + TwitterFol, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1277.67 -129.26 -82.67 89.06 2140.69
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 119.75637 25.37908 4.719 4.17e-06 ***
## IGFol 0.57449 0.06642 8.649 9.93e-16 ***
## TwitterFol 0.32955 0.10372 3.177 0.0017 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 356.4 on 225 degrees of freedom
## (853 observations deleted due to missingness)
## Multiple R-squared: 0.3454, Adjusted R-squared: 0.3396
## F-statistic: 59.36 on 2 and 225 DF, p-value: < 2.2e-16
This regression could be improved by including, in the next survey, other continuous variables that might be correlated with the number of Facebook friends, such as how much time is spent on Facebook or how many friends he/she has in real life (e.g. number of schoolmates).
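To make the two R-squared figures in the regression output concrete, the sketch below recomputes both by hand on simulated data; the column names `ig`, `tw` and `fb` are hypothetical stand-ins, and none of the values come from the survey. Plugging the reported multiple R-squared (0.3454), 228 observations and 2 predictors into the same adjustment formula gives about 0.3396, which matches the output above.

```r
# Hypothetical follower counts, simulated for illustration only.
set.seed(3)
n <- 200
toy <- data.frame(ig = rexp(n, 1 / 300), tw = rexp(n, 1 / 150))
toy$fb <- 120 + 0.6 * toy$ig + 0.3 * toy$tw + rnorm(n, sd = 300)

fit <- lm(fb ~ ig + tw, data = toy)
s <- summary(fit)

# R-squared by hand: 1 - residual sum of squares / total sum of squares.
rss <- sum(residuals(fit)^2)
tss <- sum((toy$fb - mean(toy$fb))^2)
r2 <- 1 - rss / tss
# Adjusted R-squared penalises the number of predictors p.
p <- 2
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, summary_r2 = s$r.squared))
print(c(adj_r2 = adj_r2, summary_adj = s$adj.r.squared))
```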
Hypothesis: The more social media accounts/electronic devices a teen has, the more time he/she spends with close friends
Background info
For this question, the relevant dataset questions are:
KF12: Now, thinking again about friends, please think about the friend you are closest to, someone you can talk to about things that are really important to you, but who is not a boyfriend or girlfriend. How often are you in touch with this person? The options respondents had were:
1. Many times a day
2. Once a day
3. A few times a week
4. Once a week
5. Once every few weeks
6. Less often
7. Do not have a close friend
To ease interpretation, the answer scale of the communication frequency question is reversed, so that 1 is the least and 7 the most frequent. After conversion, the scale becomes:
1. Do not have a close friend
2. Less often
3. Once every few weeks
4. Once a week
5. A few times a week
6. Once a day
7. Many times a day
K6_1: Which of the following social media do you use? Facebook?
K6_2: Which of the following social media do you use? Twitter?
K6_3: Which of the following social media do you use? Instagram?
K6_4: Which of the following social media do you use? Google+?
K6_5: Which of the following social media do you use? Snapchat?
K6_6: Which of the following social media do you use? Vine?
K6_7: Which of the following social media do you use? Tumblr?
The options respondents had were:
1. Yes
2. No
The number of social media accounts a person has can be calculated by summing up all “Yes” answers.
K3_A: Do you have a smartphone?
K3_B: Do you have a cell phone that is not a smartphone?
K3_C: Do you have a desktop or laptop computer?
K3_D: Do you have a tablet computer like an iPad, Samsung Galaxy or Kindle Fire?
K3_E: Do you have a gaming console like an Xbox, PlayStation or Wii?
The options respondents had were yes or no.
The total number of types of electronic device that a person possesses is calculated by summing up all “Yes” answers and stored in a new variable (TotalDeviceType in the output below).
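The two preparation steps described above (counting “Yes” answers and reversing the 7-point scale) can be sketched as follows. The data frame and the derived column names `n_accounts` and `contact_freq` are hypothetical, and only three of the K6 items are included for brevity.

```r
# Hypothetical raw answers using the survey's coding (1 = Yes, 2 = No); not real data.
toy <- data.frame(
  K6_1 = c(1, 2, 1), K6_2 = c(2, 2, 1), K6_3 = c(1, 2, 1),
  KF12 = c(1, 7, 3)   # 1 = many times a day ... 7 = no close friend
)

# Count the "Yes" answers per respondent; na.rm = TRUE treats skipped items as not Yes.
account_cols <- c("K6_1", "K6_2", "K6_3")
toy$n_accounts <- rowSums(toy[account_cols] == 1, na.rm = TRUE)

# Reverse the 7-point scale so that larger values mean more frequent contact.
toy$contact_freq <- 8 - toy$KF12

print(toy)
```

Subtracting from 8 maps 1 to 7 and 7 to 1, which reproduces the converted scale listed above.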
Findings
People frequently relate social media to friendship, particularly how it might affect the time they spend with their close friends. Interest in this area is understandable, since social media did not exist a few years back and has now grown quickly into many different platforms such as Facebook, Snapchat, and Google+.
In the survey, social media users were asked how often they are in touch with their closest friend, which is used here as a measure of time spent with close friends. This includes face-to-face, on the phone, text messaging and all the other ways one might talk to this person.
The correlation between the total number of social media accounts a person has and how often that person interacts with close friends through any medium can then be calculated.
##
## Pearson's product-moment correlation
##
## data: data_specific$y and data_specific$x
## t = 8.5435, df = 1045, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1979956 0.3112819
## sample estimates:
## cor
## 0.2555156
The correlation between the two variables is 0.256 and the p-value from the correlation test is < 2.2e-16, which means that the null hypothesis that the correlation is equal to 0 can be rejected. The 95% confidence interval runs from 0.20 to 0.31, which also indicates a positive relationship between the number of social media accounts a teen has and how much time he/she spends with close friends. Thus the more social media accounts a person has, the more time he/she tends to spend with close friends, either face-to-face or communicating through social media.
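As a check on the test output for the social media accounts correlation, the t statistic of a Pearson correlation test can be recovered from the correlation $r$ and the degrees of freedom $\mathrm{df} = n - 2$:

$$
t = \frac{r\sqrt{\mathrm{df}}}{\sqrt{1 - r^2}} = \frac{0.2555 \times \sqrt{1045}}{\sqrt{1 - 0.2555^2}} \approx \frac{8.26}{0.967} \approx 8.54
$$

This agrees with the reported t = 8.5435. The formula also shows that, with more than a thousand observations, even a modest correlation produces a very small p-value, so the p-value here reflects the sample size as much as the strength of the association.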
Another correlation that might exist is between the number of types of electronic device someone has and the time spent with close friends. This again includes face-to-face communication and any other way someone may interact with close friends.
Correlation test between the time the child spent with close friends and the number of electronic device types the child possesses:
##
## Pearson's product-moment correlation
##
## data: data_specific$y and data_specific$TotalDeviceType
## t = 4.5409, df = 1046, p-value = 6.253e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07914807 0.19792751
## sample estimates:
## cor
## 0.1390378
The correlation between the two variables is 0.139 and the p-value from the correlation test is 6.253e-06, which means that the null hypothesis that the correlation is equal to 0 can be rejected. The 95% confidence interval runs from 0.08 to 0.20, which also indicates a positive relationship between the number of electronic device types the teen has and how much time he/she spends with close friends. This might be because electronic devices such as smartphones or laptops help teens communicate more intensively with their close friends, and because gaming consoles might be a way to spend time together with close friends.
These findings could be made more precise. It would be better if the survey split the question on time spent with close friends into two questions: “How much time is spent with close friends face-to-face (e.g. having lunch together, playing basketball, playing games)?” and “How much time is spent with close friends in any other way (e.g. text messaging, Facebook chats, phone calls)?”
Hypothesis: Teens who think that people tend to show a different side of themselves on social media would be less likely to experience online dating
Background info
For this question, the relevant dataset questions are:
KFSNS3_A: Do you agree or disagree with each of the following statements? People get to show different sides of themselves on social media that they can’t show offline? The options respondents had were:
1. Strongly agree
2. Agree
3. Disagree
4. Strongly disagree
KR2: Have you ever had a boyfriend, girlfriend or significant other that you first met online, but never met in person? The options respondents had were:
1. Yes
2. No
Findings
Social media captures not only a teen’s relationships with friends but also his/her relationship with a girlfriend/boyfriend. One interesting aspect is teens’ experiences of finding a boyfriend/girlfriend through social media.
##
## Pearson's product-moment correlation
##
## data: data_specific$x1 and data_specific$y
## t = 2.4989, df = 81, p-value = 0.01448
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.05501593 0.45685763
## sample estimates:
## cor
## 0.2675306
The correlation between the two variables turned out to be positive. Teens who perceive that other people show a different side of themselves online that they cannot show offline are more likely to have had a relationship with somebody they met online and never met in person. This may mean that they find a person more interesting online and hence fall for that person, or perhaps that one is better able to present a more charming side, or one’s true self, on social media than when meeting in person.
Hypothesis: Teens aged 13-17 will be less likely to send flirty messages, pictures, or videos to someone they find attractive
Background info
For this question, the relevant dataset questions are:
KDATE2_D: Have you ever done any of these things to let someone know you were attracted to them or interested in them? Have you sent them flirtatious messages? The options respondents had were:
1. Yes
2. No
KDATE2_G: Have you ever done any of these things to let someone know you were attracted to them or interested in them? Have you sent them sexy or flirty pictures or videos of yourself? The options respondents had were:
1. Yes
2. No
QS10_1-15: Please check one or more categories below to indicate what race(s) you consider yourself to be.
Child_gender: Is your child (the user of social media) male or female? The options respondents had were:
1. Male
2. Female
Findings
Correlation between sending flirty messages and Black/African American teens:
##
## Pearson's product-moment correlation
##
## data: data_specific$x2 and data_specific$a
## t = -1.8264, df = 1051, p-value = 0.06808
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.11626379 0.00417964
## sample estimates:
## cor
## -0.05624671
Correlation between sending flirty messages and Japanese teens:
##
## Pearson's product-moment correlation
##
## data: data_specific$x7 and data_specific$a
## t = -2.7905, df = 1051, p-value = 0.005359
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.14541621 -0.02547728
## sample estimates:
## cor
## -0.08575744
Correlation between sending flirty videos and Japanese teens:
##
## Pearson's product-moment correlation
##
## data: data_specific$x7 and data_specific$b
## t = -2.2002, df = 1051, p-value = 0.02801
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.127602854 -0.007330528
## sample estimates:
## cor
## -0.06771269
For the last test, the correlation between the two variables is -0.068 and the p-value from the correlation test is 0.02801, which means that the null hypothesis that the correlation is equal to 0 can be rejected. The 95% confidence interval runs from -0.128 to -0.007. The correlation tests show negative correlations in all three cases, although the one for Black/African American teens (p-value 0.06808) is only significant at the 10% level. This is interpreted to mean that Japanese and Black/African American teens were more likely to send flirtatious pictures, videos and messages than teens of other races, such as White, Asian Indian, Chinese, Filipino, Korean, and some other races. The correlations are weak, however, with magnitudes below 0.1.
Hypothesis: How would teens react when they break up with their girlfriend/boyfriend? Will they do such things as block or unfriend their ex on social media, or even remove their ex from their phone address book?
Background info
For this question, the relevant dataset questions are:
KRSNS4_A: Have you ever unfriended or blocked someone that you used to be in a relationship with? The options respondents had were:
1. Yes
2. No
KRCELL_D: Have you ever removed someone that you used to be in a relationship with from your phone address book? The options respondents had were:
1. Yes
2. No
Child_gender: Is your child (the user of social media) male or female? The options respondents had were:
1. Male
2. Female
Child_age: How old is your child (user of social media)? The options respondents had were 13/14/15/16/17 years old.
Findings
Here is the graph showing responses to the question “Have you ever unfriended or blocked someone that you used to be in a relationship with?” by gender:
It can be seen that males are less likely to have unfriended or blocked an ex on social media. This might be because female teens are a little more emotional than male teens, especially after breaking up.
These are the responses to the other question, “Have you ever removed someone that you used to be in a relationship with from your phone address book?”:
Again it can be seen that males are less likely than females to have removed an ex from their phone address book.
As responses to this question might differ by the teen’s age, the graph below shows how each age group (13/14/15/16/17) responded.
The graph shows that only a few teens aged 13-14 have ever removed an ex from their phone address book, while older teens seem more likely to have done so. A correlation test between this question and the teen’s age reveals a negative correlation between the two variables.
##
## Pearson's product-moment correlation
##
## data: survey_data$Child_age and survey_data$KRCELL_D
## t = -2.0583, df = 341, p-value = 0.04032
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.214160262 -0.004939567
## sample estimates:
## cor
## -0.1107771
The Pearson correlation between the two variables is -0.11 and the p-value from the correlation test is 0.04032, which means that the null hypothesis that the correlation is equal to 0 can be rejected at the 5% level. The 95% confidence interval runs from -0.214 to -0.005. Because the answer is coded 1 for Yes and 2 for No, a negative correlation with age shows that older teenagers are more likely than younger teenagers to have removed an ex from their phone address book.
For some of the investigations in this section, only a very small sample is left after filtering out invalid / irrelevant data, which makes the results less convincing (for example question 4 on Japanese and African American teens). A much larger dataset is needed to provide a larger sample after filtering and give strong evidence for the conclusions.
The findings are summarised as follows:
On average, female teens have more friends on Facebook and more followers on Instagram than male teens. The number of social media accounts that female teens use is also slightly higher than for male teens.
Tests show a positive correlation between teens spending more time with their close friends - either face-to-face, by phone, or through any other media - and having more social media accounts or electronic devices. This might indicate that social media and electronic devices help teens to communicate or get along with their friends.
One might think that people would be less likely to date online when they are aware that other people are showing a different side of themselves. The data suggest otherwise: perhaps it is this different side, presented on the internet, that one falls in love with.
It turned out that teens with an African American background and teens with a Japanese background have a positive tendency to send flirtatious messages, pictures, or videos to show their interest in somebody else.
Results show that female teens are more likely than male teens to unfriend/block their ex. It can also be seen that the older teens are, the more likely they are to remove their ex from their phone address book.
In general the dataset offers a lot of questions focusing on the use, general behaviour and perception of social media by parents and their kids between the age of 13 and 17 years.
The previous research was mainly focusing on quite “straight forward” approaches, leading to already interesting results. Therefore the following will focus on a more descriptive and deep dive approach.
Social media such as Facebook or Instagram provide a platform that enables people to engage and participate in social activities in a way that is close to the normal social spectrum, and that furthermore extends the reach of every person up to the total number of users.
This imposes new challenges on each individual, such as defining a close network of peers and differentiating between online and real-world relationships, if there is any difference at all. But one of the most interesting questions is, “How does social media change the way one perceives him/herself, enabling benchmarking against more than 1bn different people, compared to the wider circle of acquaintances 15 years ago?”
According to a Huffington Post article, research has been conducted on how Facebook significantly affects one’s feelings, as well as on how one’s personality is reflected in certain behavioural patterns on social media. (Article)
Given this, it seems logical to infer that the higher the exposure to social media, the higher the impact on feelings and, in consequence, on self-perception. Defining a variable that measures “exposure” is tricky, especially if it is to capture not just the hours spent on Facebook but also how emotionally engaged people are with their own social media network. Generally, the quality or value added of any network relies strongly on its members; in this case, that is the number of friends a person has on a certain social media platform.
The basic assumption therefore is that the number of friends should have a significant impact on the perceived value added of social media for each individual. It should therefore correlate positively with exposure (the higher the perceived value, the more time spent online and hence the more exposure) and, based on the assumptions stated in the article, it should have an impact on feelings and self-perception.
Hypothesis: “The number of friends on Facebook or followers on Instagram has a significant impact on the perceived value added of social media and consequently on how one feels about one’s own life”.
This hypothesis was amended because of findings during the analysis. The set of 5 questions requires, at least in part, a certain capability for self-reflection, emotional intelligence and general maturity:
1. In general, does social media make you feel more connected to information about what’s going on in your friends’ lives?
2. In general, does social media make you feel worse about your own life because of what you see from other friends on social media?
3. In general, does social media make you feel better connected to your friends’ feelings?
4. In general, does social media make you feel pressure to post content that will be popular and get lots of comments or likes?
5. In general, does social media make you feel pressure to only post content that makes you look good to others?
Questions 4 and 5 in particular support this assumption. Therefore the hypothesis was amended to the following.
Hypothesis: “The age and the number of friends on Facebook or followers on Instagram have a significant impact on the perceived value added of social media and consequently on how one feels about one’s own life”.
Given the holistic nature of the question, the following description will predominantly focus on the approach and the rationale behind certain decisions.
Before the initial approach is explained, the following preparatory steps need to be carried out.
Besides the standardised global cleaning, which was done centrally, the following analysis requires further cleaning for the following reason:
The survey treats several different media channels as “social media”. With regard to the hypothesis, not all of the channels considered by the survey are applicable; for example, WhatsApp is predominantly a peer-to-peer instant messenger service and does not fulfil the criterion of social exposure within the network. Therefore, in the following, only Facebook and Instagram are summarised as social media.
Besides cleaning out invalid responses to all examined questions, all respondents who use neither Facebook nor Instagram have to be removed. This means that every response considered valid for this analysis must contain valid answers to all research questions and must come from a respondent who is present in at least one of the two networks.
This cleaning is crucial for the validity of later results, but unfortunately decreases the number of responses from initially 1081 to a final 755. In general, the comparatively small number of responses in the survey needs to be kept in mind at all times.
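The filter described above could be sketched in R roughly as follows. This is a minimal illustration on toy data, not the original cleaning script: the columns `uses_fb` and `uses_ig` are hypothetical names, and `NA` is assumed to stand in for an invalid or refused answer.

```r
# Toy data; column names (uses_fb, uses_ig) are hypothetical,
# and NA stands in for an invalid or refused answer.
survey <- data.frame(
  uses_fb  = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  uses_ig  = c(FALSE, FALSE, TRUE, TRUE, FALSE),
  KFSNS1_A = c(1, 2, 3, NA, 2),
  KFSNS1_B = c(3, 3, 1, 2, 3)
)

question_cols <- c("KFSNS1_A", "KFSNS1_B")

# Keep respondents present in at least one of the two networks ...
in_network <- survey$uses_fb | survey$uses_ig
# ... who also gave a valid answer to every examined question.
all_valid <- complete.cases(survey[, question_cols])

cleaned <- survey[in_network & all_valid, ]
nrow(cleaned)
```

Combining both conditions with `&` mirrors the requirement that a response needs valid answers and membership in at least one network at the same time.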
The cleaned data set is now ready to be clustered. The selection of Facebook and Instagram as representative networks follows the rationale of reciprocity, size, primary way of communication and popularity.
Reciprocity (reciprocal): The connections between members of Facebook (friends) are exclusively reciprocal and need to be requested and approved individually. Therefore there is no mismatch between who can access one’s own information and posts and whose information and posts one can access.
Reciprocity (non-reciprocal): Instagram and Twitter offer non-reciprocal relationships, which can lead to significant mismatches between the information streams. Extreme examples of this pattern can be seen with celebrity profiles, which sometimes follow 200 profiles while being followed by several million people.
Size: Facebook and Instagram are both extremely popular and established networks: Facebook as the game changer in the early 2000s, and Instagram as the innovator that predominantly moved communication from text to pictures. Because of its non-reciprocal character and picture-based communication, Instagram is extremely interesting with regard to exposure and subtle messages, since pictures are more capable of carrying these than written messages.
Based on this logic, every respondent is assigned to a tercile (three groups of roughly 33% each, with 3 being the highest and 1 the lowest), based on the individual number of friends on Facebook and of followers on Instagram.
This seems to be in line with the hypothesis, since it can be assumed that respondents with a comparatively higher number of friends or followers have a comparatively higher exposure to the network and therefore might have different views on certain things, such as the perception of others or the pressure to post content.
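A tercile assignment of this kind could be sketched in R as follows. The friend counts are made-up illustrative values and `fb_friends` is a hypothetical variable name; the original script may have used a different function or quantile definition.

```r
# Hypothetical friend counts for nine respondents (illustrative values only).
fb_friends <- c(12, 45, 80, 150, 200, 260, 340, 500, 900)

# Cut points at the 1/3 and 2/3 quantiles define the three groups.
breaks <- quantile(fb_friends, probs = c(0, 1/3, 2/3, 1))

# labels = FALSE returns the group number (1 = lowest, 3 = highest);
# include.lowest = TRUE keeps the minimum value in tercile 1.
fb_tercile <- cut(fb_friends, breaks = breaks, labels = FALSE, include.lowest = TRUE)

table(fb_tercile)
```

Note that `cut()` requires unique break points, so heavily tied counts (for example many respondents with the same number of friends) would need separate handling.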
To ensure the comparability of the different subsets clustered by tercile, a limited analysis was conducted based on frequency of usage. The following assumptions were made:
People with an account in both networks can be considered the most exposed individuals in the set. Therefore the variance of logins within this subset was compared to the variance of logins in the entire data set across the different clusters. Because respondents only need to be present in either one network or the other, the correlation between belonging to the same cluster in the two networks could only be computed within this subset.
There is a moderate correlation of 0.4795945.
The frequency of usage was measured by question KFR11_H:
How often do you spend time with friends posting on social media sites? 1. Every day, 2. Every few days, 3. Less often, 4. Never
The variance of logins is 0.7885608 in the full set and 0.7797496 in the subset, which leads to the conclusion that, before clustering by tercile, logins vary almost equally in both sets.
As shown below, for Facebook the variance of logins varies more strongly across the clusters in the full set than in the subset, while for Instagram the variation is of a similar size in both.
| Facebook_Sub | Facebook_Total | Instagram_Sub | Instagram_Total |
|---|---|---|---|
| 0.7953976 | 0.8339709 | 0.7601583 | 0.7930846 |
| 0.7867325 | 0.7446632 | 0.7586703 | 0.7631275 |
| 0.7668948 | 0.8250320 | 0.8330001 | 0.7353005 |
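The comparison described above could be computed along the following lines. This is a sketch on toy data only: `fb_tercile` and `ig_tercile` are hypothetical column names, and the Pearson coefficient (the default of `cor()`) is an assumption, since the report does not state which coefficient was used.

```r
# Toy subset of respondents with accounts in both networks.
# Column names are hypothetical; KFR11_H is coded 1 (every day) to 4 (never).
both <- data.frame(
  fb_tercile = c(1, 1, 2, 2, 3, 3, 1, 3),
  ig_tercile = c(1, 2, 2, 3, 3, 3, 1, 2),
  KFR11_H    = c(4, 3, 2, 3, 1, 2, 4, 1)
)

# Correlation between the tercile a respondent belongs to in each network.
tercile_cor <- cor(both$fb_tercile, both$ig_tercile)

# Variance of the usage frequency overall and within each Facebook tercile.
var_total <- var(both$KFR11_H)
var_by_fb <- tapply(both$KFR11_H, both$fb_tercile, var)
```

Running the same `tapply()` call once on the subset and once on the full data set, for each network, would give four columns of per-tercile variances.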
As mentioned initially, many of the conclusions relevant to the final approach evolved during the first attempt at analysing the data. After cleaning and clustering the data, a visual analysis for assumed patterns was conducted using heat maps. The correlation based on the clustering had been expected to be much clearer, although some weak patterns supporting the initial idea could be observed; for example, respondents from cohort 3 generally feel more connected to information than others. Furthermore, most of the respondents stating that they “feel worse about their lives” considered themselves to be highly connected to information.
How do responses vary among the clusters by tercile?
How do the questions on access to information and on the perception of one’s own life correlate?
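The matrix behind such a heat map can be sketched in R as a cross-tabulation of tercile against answer. The data, the answer labels and the column name `fb_tercile` below are illustrative assumptions, not the survey's actual coding.

```r
# Toy responses; the answer labels and the column names are illustrative only.
answers <- c("Yes, a lot", "Yes, a little", "No")
toy <- data.frame(
  fb_tercile = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  KFSNS1_A   = factor(c("No", "Yes, a little", "No",
                        "Yes, a little", "Yes, a little", "Yes, a lot",
                        "Yes, a lot", "Yes, a lot", "Yes, a little", "Yes, a lot"),
                      levels = answers)
)

# Counts per tercile and answer, then the share of each answer within a tercile.
counts <- table(toy$fb_tercile, toy$KFSNS1_A)
row_share <- prop.table(counts, margin = 1)

# heatmap(unclass(row_share), Rowv = NA, Colv = NA, scale = "none")
# would then draw the matrix without reordering rows or columns.
```

Using row shares rather than raw counts keeps terciles of different size comparable in the plot.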
Conclusion:
The heat maps indicate no significant relationship between the perceived value added of the network (feeling connected to information) and either the number of friends on Facebook or the perceived mood (feeling bad about life).
In particular, the answer “Yes a little” was chosen comparatively often, which might be due to its “neither nor” character. Based on these observations, the following two conclusions were drawn:
The questions KFSNS1_B: In general, does social media make you feel worse about your own life because of what you see from other friends on social media? KFSNS1_D: In general, does social media make you feel pressure to post content that will be popular and get lots of comments or likes? KFSNS1_E: In general, does social media make you feel pressure to only post content that makes you look good to others?
receive comparatively many negative or indifferent responses. Given the initial thoughts, this might be due to the age of the participants and the emotional capabilities mentioned above. To evaluate whether age has an impact on the way of responding, the dataset was re-clustered by age.
The following results are solely clustered by age, regardless of the tercile for the networks. This was done to look at the impact of age on the results in an isolated way.
The table shows the correlations for all unique combinations of questions per age cohort. The correlation for the same combination of questions varies slightly among the cohorts. With 130, 147, 136, 172 and 170 respondents (in increasing order of age, starting with 13), the respondents are more or less evenly distributed among the ages. Therefore the observation can be assumed to be valid, without too much variation caused by differing sample sizes.
| Question pair | Cohort 13 | Cohort 14 | Cohort 15 | Cohort 16 | Cohort 17 | Cohort Total |
|---|---|---|---|---|---|---|
| KFSNS1_A / KFSNS1_B | 0.2794975 | 0.2565006 | 0.1945774 | 0.1091057 | 0.2666693 | 0.2202745 |
| KFSNS1_A / KFSNS1_C | 0.5247100 | 0.5435872 | 0.4920945 | 0.5780586 | 0.4833469 | 0.5256724 |
| KFSNS1_A / KFSNS1_D | 0.2848084 | 0.2753407 | 0.3173093 | 0.2768121 | 0.4298434 | 0.3209615 |
| KFSNS1_A / KFSNS1_E | 0.3025980 | 0.2824830 | 0.3018576 | 0.2467113 | 0.3577742 | 0.2948440 |
| KFSNS1_B / KFSNS1_C | 0.2445661 | 0.2325581 | 0.0791981 | 0.0328683 | 0.2571253 | 0.1731575 |
| KFSNS1_B / KFSNS1_D | 0.5199121 | 0.4502472 | 0.2853243 | 0.3764469 | 0.4975925 | 0.4313815 |
| KFSNS1_B / KFSNS1_E | 0.5576299 | 0.4343617 | 0.4813171 | 0.3555698 | 0.5207501 | 0.4678951 |
| KFSNS1_C / KFSNS1_D | 0.2558098 | 0.2876513 | 0.1682519 | 0.3894248 | 0.3299875 | 0.2963760 |
| KFSNS1_C / KFSNS1_E | 0.2793423 | 0.3345391 | 0.1271861 | 0.2699642 | 0.3071706 | 0.2694057 |
| KFSNS1_D / KFSNS1_E | 0.8115707 | 0.6504855 | 0.6714623 | 0.5872474 | 0.6274787 | 0.6657476 |
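A table of this shape, with one row per unique question pair and one column per age cohort, could be produced as sketched below. The responses are randomly generated toy data (coded 1 to 3 as an assumption), so the resulting numbers have no relation to the survey results; the use of Pearson correlation is likewise an assumption.

```r
set.seed(1)  # reproducible toy data, not survey results
questions <- paste0("KFSNS1_", LETTERS[1:5])

# Hypothetical responses coded 1 to 3 for 60 respondents aged 13 to 17.
toy <- as.data.frame(matrix(sample(1:3, 60 * 5, replace = TRUE), ncol = 5))
names(toy) <- questions
toy$age <- rep(13:17, each = 12)

# All unique pairs of questions: choose(5, 2) = 10 combinations.
pairs <- combn(questions, 2)

# Correlation of one question pair within one data frame.
pair_cor <- function(df, p) cor(df[[p[1]]], df[[p[2]]])

# One column per age cohort, one row per question pair.
by_age <- sapply(split(toy, toy$age),
                 function(df) apply(pairs, 2, pair_cor, df = df))
rownames(by_age) <- apply(pairs, 2, paste, collapse = " / ")
```

Splitting by `interaction(age, tercile)` instead of `age` alone would extend the same approach to a finer clustering.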
The plot confirms the previous observation that the correlations for the same pair of questions vary across the cohorts. Looking at question pair KFSNS1_D / KFSNS1_E within cohort 13, it is interesting to observe an outstandingly high correlation for this pair, and only in this cohort (displayed in the table above). There are two competing explanations for this. Questions D and E aim in almost exactly the same direction but are phrased slightly differently. Therefore, either the difference between the questions was not correctly perceived, or, if E is a control question for D, 13-year-olds are more honest and respond in the same way to equivalent questions.
Following the same logic as above, the analysis of correlations between the different question sets is now based on a new clustering. The respondents are clustered by their age and their tercile of friends or followers, and their responses are split up by the two networks Facebook and Instagram.
The number of observations in each cluster is important for the reliability of further conclusions based on this data set.
Occurrences: the number of respondents who have an account in the corresponding network and belong to the corresponding cluster
tercile_Group: a value of 1 indicates that the cluster’s number of occurrences lies in the lowest 10% quantile, so the cluster shouldn’t be considered due to its small sample size
| Cohort | Occurrences_FB | Occurrences_IG | tercile_Group_FB | tercile_Group_IG |
|---|---|---|---|---|
| 13 / 1 | 36 | 22 | 1 | 1 |
| 13 / 2 | 39 | 29 | 2 | 2 |
| 13 / 3 | 19 | 36 | 1 | 2 |
| 14 / 1 | 40 | 30 | 2 | 2 |
| 14 / 2 | 47 | 24 | 2 | 1 |
| 14 / 3 | 40 | 33 | 2 | 2 |
| 15 / 1 | 42 | 38 | 2 | 2 |
| 15 / 2 | 41 | 29 | 2 | 2 |
| 15 / 3 | 40 | 26 | 2 | 2 |
| 16 / 1 | 58 | 58 | 2 | 2 |
| 16 / 2 | 39 | 24 | 2 | 1 |
| 16 / 3 | 59 | 35 | 2 | 2 |
| 17 / 1 | 52 | 44 | 2 | 2 |
| 17 / 2 | 43 | 26 | 2 | 2 |
| 17 / 3 | 66 | 37 | 2 | 2 |
| Total / 1 | 228 | 192 | 2 | 2 |
| Total / 2 | 209 | 132 | 2 | 2 |
| Total / 3 | 224 | 167 | 2 | 2 |
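The flag in the group columns can be reproduced for Facebook with a short R sketch. The occurrence counts are copied from the table above; the rule itself (group 1 if the count is at or below the 10% quantile of all 18 rows, using R's default quantile definition) is an assumption that happens to match the table, not a confirmed description of the original script.

```r
# Facebook occurrences per cluster, copied from the table above.
cluster <- c(paste(rep(13:17, each = 3), rep(1:3, times = 5), sep = " / "),
             paste("Total", 1:3, sep = " / "))
occ_fb <- c(36, 39, 19, 40, 47, 40, 42, 41, 40,
            58, 39, 59, 52, 43, 66, 228, 209, 224)

# Assumed rule: group 1 if the count is at or below the 10% quantile, else group 2.
threshold <- quantile(occ_fb, probs = 0.10)
group_fb <- ifelse(occ_fb <= threshold, 1, 2)

cluster[group_fb == 1]
```

Under this assumption the clusters 13 / 1 and 13 / 3 are flagged, in line with the Facebook column of the table.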
Observations:
Most notable correlations: The most notable correlations can be observed for the question pairs (A/C, B/E, D/E, B/C). Starting with the already mentioned combination (D/E), whose questions are closely related, the following can be concluded from the plot. Looking at all counterparts, that is, the same first question matched with D and with E, it becomes obvious that these questions, with the exception of (B/D, B/E), correlate more or less equally with the respective other question. Taking the observation for (D/E) into consideration as well, it appears valid to conclude that the respondents perceive these two questions similarly.
The same holds for the combination (A/C), whose questions ask about the quality of the connection to friends and to their feelings respectively. The remaining pair (B/E) is most likely the most interesting combination, as it relates feeling “worse about your own life because of what you see from other friends on social media” to feeling the pressure to “only post content that makes you look good to others”. This observation is strongly in line with the research presented initially and discussed by the (Huffington Post).
The last interesting observation, with regard to the variance of the values, is the combination (B/C). The values range from positive to negative correlations, which might be explained by the following assumption: some people might perceive it as positive to be connected to their friends’ feelings, while others might get jealous and therefore experience it as a negative feeling.
| Question pair (Facebook) | 13 / 1 | 13 / 2 | 13 / 3 | 14 / 1 | 14 / 2 | 14 / 3 | 15 / 1 | 15 / 2 | 15 / 3 | 16 / 1 | 16 / 2 | 16 / 3 | 17 / 1 | 17 / 2 | 17 / 3 | Total / 1 | Total / 2 | Total / 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KFSNS1_A / KFSNS1_B | 0.0516398 | 0.3846784 | 0.4717945 | 0.1479999 | 0.2966666 | 0.1917819 | 0.2721525 | 0.0000000 | 0.2102082 | 0.1877500 | -0.0127336 | 0.1420755 | 0.4192235 | 0.2639285 | 0.2071129 | 0.2187846 | 0.2096518 | 0.2192974 |
| KFSNS1_A / KFSNS1_C | 0.6348111 | 0.3292377 | 0.6972201 | 0.6594522 | 0.3812483 | 0.6230366 | 0.5958645 | 0.3577667 | 0.5366378 | 0.5439614 | 0.6547480 | 0.5000234 | 0.6119803 | 0.3023899 | 0.4384855 | 0.5913736 | 0.4252869 | 0.5300244 |
| KFSNS1_A / KFSNS1_D | 0.0730297 | 0.3024592 | 0.3719387 | 0.2812593 | 0.1516595 | 0.1613196 | 0.6466862 | 0.0843100 | 0.0575739 | 0.3699212 | 0.3024685 | 0.1932108 | 0.5869392 | 0.4471260 | 0.3615233 | 0.4106855 | 0.2679292 | 0.2482770 |
| KFSNS1_A / KFSNS1_E | 0.0905204 | 0.2911625 | 0.4458151 | 0.3262676 | 0.2762352 | 0.1673923 | 0.4638176 | 0.2338437 | 0.0796735 | 0.3200560 | 0.3435784 | 0.1307670 | 0.4721025 | 0.2421106 | 0.3520961 | 0.3520948 | 0.2742922 | 0.2301222 |
| KFSNS1_B / KFSNS1_C | 0.0546358 | 0.1347215 | 0.4414921 | 0.0845305 | 0.4817815 | 0.1517081 | 0.2530332 | -0.2778478 | 0.1896936 | 0.1169195 | 0.0168685 | -0.0156214 | 0.4693199 | 0.2629336 | 0.1009589 | 0.2211450 | 0.1748126 | 0.1560581 |
| KFSNS1_B / KFSNS1_D | 0.2828427 | 0.5587090 | 0.4440790 | 0.2356196 | 0.5112117 | 0.3120814 | 0.2142467 | 0.2658240 | 0.3898544 | 0.3188903 | 0.2549993 | 0.4688595 | 0.6122175 | 0.4732173 | 0.4517372 | 0.3553603 | 0.4388870 | 0.4230241 |
| KFSNS1_B / KFSNS1_E | 0.3505839 | 0.5981939 | 0.4866933 | 0.2563280 | 0.5398889 | 0.3445345 | 0.5082681 | 0.4346152 | 0.4951565 | 0.2717626 | -0.0802082 | 0.5551718 | 0.5489222 | 0.3851070 | 0.6059660 | 0.3746980 | 0.4226368 | 0.5151954 |
| KFSNS1_C / KFSNS1_D | 0.0772667 | 0.2454460 | 0.3807225 | 0.0746289 | 0.3595861 | 0.3970945 | 0.4561245 | -0.0370788 | -0.0851180 | 0.4388917 | 0.4183272 | 0.3693921 | 0.3951147 | 0.3282660 | 0.3665716 | 0.3131372 | 0.3041539 | 0.3039020 |
| KFSNS1_C / KFSNS1_E | 0.1694432 | 0.1589863 | 0.3543374 | 0.2955529 | 0.4297937 | 0.3741709 | 0.2684246 | -0.0285673 | 0.0107082 | 0.3592339 | 0.2929676 | 0.2741161 | 0.4907480 | 0.3006247 | 0.2270299 | 0.3326682 | 0.2453354 | 0.2591845 |
| KFSNS1_D / KFSNS1_E | 0.7246316 | 0.8610940 | 0.8588140 | 0.5598016 | 0.8212396 | 0.4900980 | 0.6179144 | 0.4330970 | 0.7415153 | 0.5920741 | 0.3397062 | 0.7000992 | 0.7055671 | 0.6693286 | 0.5420921 | 0.6229606 | 0.6708097 | 0.6533829 |
In general, the Instagram results follow the same patterns as observed with Facebook. Looking at the previously identified question pairs, (B/E) shows slightly more extreme values for Instagram but a weaker variance, while mean and median are almost equal in both networks according to the summary statistics reported further below (means of about 0.43 and 0.42, medians of about 0.49 and 0.49, Instagram first).
The situation for (B/C) is almost the same, except for a few clearly negative correlations between -0.32 and -0.38, among them the 15-year-olds with many followers (-0.36).
| Question pair (Instagram) | 13 / 1 | 13 / 2 | 13 / 3 | 14 / 1 | 14 / 2 | 14 / 3 | 15 / 1 | 15 / 2 | 15 / 3 | 16 / 1 | 16 / 2 | 16 / 3 | 17 / 1 | 17 / 2 | 17 / 3 | Total / 1 | Total / 2 | Total / 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KFSNS1_A / KFSNS1_B | 0.0436297 | 0.0417581 | 0.3359233 | 0.1632913 | 0.4101742 | 0.1695592 | 0.2792294 | -0.1238308 | 0.0991363 | 0.1397770 | -0.0303962 | -0.0245855 | 0.2803764 | 0.3002550 | 0.2704166 | 0.1943251 | 0.1401146 | 0.2022345 |
| KFSNS1_A / KFSNS1_C | 0.4339499 | 0.4546438 | 0.4625492 | 0.4840292 | 0.6506496 | 0.6760726 | 0.5519507 | 0.4344339 | 0.3193931 | 0.5383012 | 0.5662867 | 0.4968170 | 0.6729902 | 0.4535374 | 0.5276848 | 0.5500473 | 0.5158331 | 0.5077804 |
| KFSNS1_A / KFSNS1_D | 0.0361288 | 0.1460639 | 0.2889428 | 0.3737521 | 0.2921744 | 0.2208773 | 0.3632369 | 0.2229968 | 0.0606827 | 0.2747647 | 0.3176045 | 0.1790780 | 0.4649438 | 0.2655643 | 0.4165311 | 0.3277107 | 0.2484446 | 0.2698159 |
| KFSNS1_A / KFSNS1_E | 0.0408575 | 0.1220238 | 0.3864364 | 0.2533514 | 0.2903800 | 0.2997621 | 0.2792805 | 0.1322345 | 0.1281495 | 0.2532932 | 0.2071700 | 0.1844705 | 0.4859575 | 0.4426072 | 0.2986332 | 0.2908485 | 0.2402167 | 0.2810403 |
| KFSNS1_B / KFSNS1_C | 0.3028489 | 0.0601385 | 0.1589191 | -0.1071400 | 0.1636634 | 0.0632226 | 0.2435441 | -0.3216153 | -0.3577114 | 0.1969940 | -0.3757346 | -0.0798723 | 0.3096067 | 0.4917931 | 0.2652439 | 0.1843109 | 0.0158371 | 0.0843420 |
| KFSNS1_B / KFSNS1_D | 0.2587746 | 0.4130694 | 0.4261093 | 0.4286926 | 0.0944911 | 0.4219567 | 0.4872515 | 0.4411096 | -0.1406208 | 0.3730673 | 0.3977058 | 0.3114168 | 0.6355361 | 0.4799409 | 0.4721240 | 0.4460334 | 0.3333948 | 0.3657947 |
| KFSNS1_B / KFSNS1_E | 0.2006700 | 0.6982875 | 0.4931470 | -0.0088546 | 0.3286879 | 0.5301507 | 0.5489356 | 0.4480552 | 0.1247741 | 0.4174060 | 0.5526203 | 0.2497625 | 0.5525851 | 0.5999262 | 0.6808509 | 0.3755384 | 0.5012750 | 0.4657734 |
| KFSNS1_C / KFSNS1_D | 0.1055927 | 0.1731589 | 0.1032796 | 0.1276814 | 0.0000000 | 0.2353065 | 0.0992174 | -0.2139714 | 0.0432158 | 0.4110687 | 0.2886751 | 0.3600124 | 0.3553139 | 0.4879500 | 0.4404410 | 0.2828148 | 0.1480700 | 0.2780105 |
| KFSNS1_C / KFSNS1_E | 0.3070621 | 0.0747407 | 0.2888356 | 0.2648561 | 0.2053960 | 0.2941331 | -0.0029764 | -0.2013684 | 0.1477591 | 0.2888636 | 0.0173032 | 0.3026761 | 0.5106006 | 0.4879500 | 0.3017766 | 0.2806577 | 0.1455873 | 0.3013274 |
| KFSNS1_D / KFSNS1_E | 0.7754626 | 0.8354141 | 0.7159396 | 0.6688146 | 0.4743416 | 0.5872483 | 0.7315274 | 0.6519678 | 0.6721083 | 0.5956790 | 0.6993010 | 0.7937821 | 0.5880541 | 0.7142857 | 0.6892864 | 0.6476180 | 0.6642034 | 0.7008922 |
Summary statistics of the correlations across the 15 age/tercile clusters, per question pair and network:
| Question pair | Network | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|---|
| KFSNS1_A / KFSNS1_B | Instagram | -0.12380 | 0.04269 | 0.16330 | 0.15700 | 0.27980 | 0.41020 |
| KFSNS1_A / KFSNS1_B | Facebook | -0.01273 | 0.14500 | 0.20710 | 0.21560 | 0.28440 | 0.47180 |
| KFSNS1_A / KFSNS1_C | Instagram | 0.3194 | 0.4541 | 0.4968 | 0.5149 | 0.5591 | 0.6761 |
| KFSNS1_A / KFSNS1_C | Facebook | 0.3024 | 0.4099 | 0.5440 | 0.5245 | 0.6289 | 0.6972 |
| KFSNS1_A / KFSNS1_D | Instagram | 0.03613 | 0.20000 | 0.27480 | 0.26160 | 0.34040 | 0.46490 |
| KFSNS1_A / KFSNS1_D | Facebook | 0.05757 | 0.15650 | 0.30250 | 0.29280 | 0.37090 | 0.64670 |
| KFSNS1_A / KFSNS1_E | Instagram | 0.04086 | 0.15840 | 0.25340 | 0.25360 | 0.29920 | 0.48600 |
| KFSNS1_A / KFSNS1_E | Facebook | 0.07967 | 0.20060 | 0.29120 | 0.28240 | 0.34780 | 0.47210 |
| KFSNS1_B / KFSNS1_C | Instagram | -0.37570 | -0.09351 | 0.15890 | 0.06759 | 0.25440 | 0.49180 |
| KFSNS1_B / KFSNS1_C | Facebook | -0.27780 | 0.06958 | 0.13470 | 0.16430 | 0.25800 | 0.48180 |
| KFSNS1_B / KFSNS1_D | Instagram | -0.1406 | 0.3422 | 0.4220 | 0.3667 | 0.4566 | 0.6355 |
| KFSNS1_B / KFSNS1_D | Facebook | 0.2142 | 0.2743 | 0.3899 | 0.3863 | 0.4710 | 0.6122 |
| KFSNS1_B / KFSNS1_E | Instagram | -0.008855 | 0.289200 | 0.493100 | 0.427800 | 0.552600 | 0.698300 |
| KFSNS1_B / KFSNS1_E | Facebook | -0.08021 | 0.34760 | 0.48670 | 0.42010 | 0.54440 | 0.60600 |
| KFSNS1_C / KFSNS1_D | Instagram | -0.2140 | 0.1012 | 0.1732 | 0.2011 | 0.3577 | 0.4880 |
| KFSNS1_C / KFSNS1_D | Facebook | -0.08512 | 0.16140 | 0.36660 | 0.27900 | 0.39610 | 0.45610 |
| KFSNS1_C / KFSNS1_E | Instagram | -0.2014 | 0.1112 | 0.2888 | 0.2192 | 0.3022 | 0.5106 |
| KFSNS1_C / KFSNS1_E | Facebook | -0.02857 | 0.19820 | 0.29300 | 0.26520 | 0.35680 | 0.49070 |
| KFSNS1_D / KFSNS1_E | Instagram | 0.4743 | 0.6238 | 0.6893 | 0.6795 | 0.7237 | 0.8354 |
| KFSNS1_D / KFSNS1_E | Facebook | 0.3397 | 0.5509 | 0.6693 | 0.6438 | 0.7331 | 0.8611 |
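Summary statistics of this kind can be reproduced from a single row of the per-cluster correlation tables. The sketch below uses the KFSNS1_D / KFSNS1_E row of the first (Facebook) per-cluster table; leaving out the three "Total" columns is an assumption, which is consistent with the reported minimum, median, mean and maximum for that pair.

```r
# Facebook correlations for KFSNS1_D / KFSNS1_E across the 15 age/tercile
# clusters, copied from the table (the three "Total" columns are left out).
de_fb <- c(0.7246316, 0.8610940, 0.8588140, 0.5598016, 0.8212396,
           0.4900980, 0.6179144, 0.4330970, 0.7415153, 0.5920741,
           0.3397062, 0.7000992, 0.7055671, 0.6693286, 0.5420921)

# Minimum, quartiles, median, mean and maximum in one call.
de_summary <- summary(de_fb)
de_summary
```

Applying `summary()` row by row, for example with `apply(cor_table, 1, summary)` on a numeric matrix of correlations (`cor_table` being a hypothetical name), would give the statistics for all ten question pairs at once.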
Given the initial size of the entire set and the necessary cleaning, the clusters ended up being too small to allow statistically significant observations. Furthermore, the selected independent variables, number of friends or followers and age, must be evaluated under the following caveats. First, the dependent variables are deeply connected to the human psyche and the respondents are at an age that presumably brings many changes in life, so using only these two variables is a strong reduction of complexity when explaining highly connected and complex phenomena that are influenced by numerous factors. Second, given the time restrictions and the scope of the project, a deeper analysis was simply not possible. The overall aim of this analysis was to show how clustering by relevant parameters might lead to different results and insights, which are briefly recapped below.
Starting from the initial heat maps, which had only limited explanatory power, the clustering by tercile of friends and by age led to more detailed and differentiated results, clearly showing that the correlations between the responses to the questions also depend on the cluster criteria. Furthermore, as discussed above, there is a correlation between feeling “worse about your own life because of what you see from other friends on social media” and feeling the pressure to “only post content that makes you look good to others”. This observation addresses exactly what the author of the article initially describes with regard to her experience. Given the mentioned limitations of the dataset as well as the complexity of the human mind, the hypothesis must be rejected from a statistical point of view. Nevertheless, the observations also showed that exposure, expressed as the number of friends, has an impact on the perception of social media.
The investigation of the dataset revealed several interesting relationships. Firstly, parents’ behaviour seems to affect teens’ behaviour. In particular, parents stalking their teen by monitoring his/her location has a positive effect on the teen also monitoring his/her significant other’s location or accessing his/her phone. Secondly, the sample data showed differences between male and female teens in the extent of their social media usage. Female teens have more friends on Facebook and more followers on Instagram than male teens. Female teens also use more types of social media, and they tend to unfriend or block their ex more often than male teens do. The data also suggests that social media and electronic devices help teens to communicate or get along with their friends. Lastly, clustering of the responses to various survey questions showed that how social media impacted one’s self-perception depended not only on the number of friends the teen had, but also on their age. The older the teen, the more likely he/she would be able to comprehend questions that required more self-reflection. This hypothesis was supported by data which showed more variance by age for questions like “whether you feel worse about your own life because of what you see from other friends on social media”.
Nonetheless, the dataset was relatively small (especially after data cleaning and given the number of variables studied), and hence a larger sample would be required to obtain more conclusive results. In fact, if time permitted, studies could be done to see whether the trends are consistent over the years (i.e. across different cohorts of 13-17 year old teens), and further analysis could investigate whether such trends persist beyond the teenage years into adulthood. For example, longitudinal studies could track whether the surveyed teens’ responses to the same questions, such as feeling pressure to appear good, persist when they are in their 30s (i.e. it could be studied whether parents’ influence had a ‘permanent’ impact on the teen, or whether the impact was transient during the teen years).